Abstract

Language models for speech recognition typically use a probability model of the form $\Pr(a_n \mid a_1, a_2, \ldots, a_{n-1})$. Stochastic grammars, on the other hand, are typically used to assign structure to utterances. A language model of the above form is constructed from such grammars by computing the prefix probability $\sum_{w \in \Sigma^*} \Pr(a_1 \cdots a_n w)$, where $w$ represents all possible terminations of the prefix $a_1 \cdots a_n$.
The main result in this paper is an algorithm 
to compute such prefix probabilities given a 
stochastic Tree Adjoining Grammar (TAG). 
The algorithm achieves the required computation in $O(n^6)$ time. The probabilities of subderivations that do not derive any words in the prefix, but contribute structurally to its derivation, are precomputed to achieve termination.
This algorithm enables existing corpus-based es- 
timation techniques for stochastic TAGs to be 
used for language modelling. 

1 Introduction 

Given some word sequence $a_1 \cdots a_{n-1}$, speech recognition language models are used to hypothesize the next word $a_n$, which could be any word from the vocabulary $\Sigma$. This is typically done using a probability model $\Pr(a_n \mid a_1, \ldots, a_{n-1})$. Based on the assumption that modelling the hidden structure of natural language would improve performance of such language models, some researchers tried to use stochastic context-free grammars (CFGs) to produce language models (Wright and Wrigley, 1989; Jelinek and Lafferty, 1991; Stolcke, 1995). The probability model used for a stochastic grammar was $\sum_{w \in \Sigma^*} \Pr(a_1 \cdots a_n w)$. However, language models that are based on trigram probability models outperform stochastic CFGs. The common wisdom about this failure of CFGs is that trigram models are lexicalized models while CFGs are not.

Tree Adjoining Grammars (TAGs) are important in this respect since they are easily lexicalized while capturing the constituent structure of language. More importantly, TAGs allow greater linguistic expressiveness. The trees associated with words can be used to encode argument and adjunct relations in various syntactic environments. This paper assumes some familiarity with the TAG formalism; (Joshi, 1988) and (Joshi and Schabes, 1992) are good introductions to the formalism and its linguistic relevance. TAGs have been shown to have relations with both phrase-structure grammars and dependency grammars (Rambow and Joshi, 1995), which is relevant because recent work on structured language models (Chelba et al., 1997) has used dependency grammars to exploit their lexicalization. We use stochastic TAGs as such a structured language model, in contrast with earlier work where TAGs have been exploited in a class-based n-gram language model (Srinivas, 1996).



This paper derives an algorithm to compute prefix probabilities $\sum_{w \in \Sigma^*} \Pr(a_1 \cdots a_n w)$. The algorithm assumes as input a stochastic TAG $G$ and a string which is a prefix of some string in $L(G)$, the language generated by $G$. This algorithm enables existing corpus-based estimation techniques (Schabes, 1992) in stochastic TAGs to be used for language modelling.
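The link between prefix probabilities and the next-word model $\Pr(a_n \mid a_1, \ldots, a_{n-1})$ can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `TOY_LANGUAGE` is a hypothetical three-sentence distribution standing in for the string distribution of a stochastic grammar, and `prefix_prob` sums over it by brute force rather than using the TAG algorithm developed in this paper.

```python
from fractions import Fraction

# Hypothetical toy distribution over complete sentences (illustration only;
# it stands in for the distribution defined by a stochastic grammar).
TOY_LANGUAGE = {
    ("the", "dog", "barks"): Fraction(1, 2),
    ("the", "dog", "sleeps"): Fraction(1, 4),
    ("the", "cat", "sleeps"): Fraction(1, 4),
}


def prefix_prob(prefix):
    """Sum Pr(prefix + w) over all continuations w, by brute force."""
    prefix = tuple(prefix)
    return sum(p for s, p in TOY_LANGUAGE.items() if s[:len(prefix)] == prefix)


def next_word_prob(history, word):
    """Pr(word | history), computed as a ratio of two prefix probabilities."""
    denominator = prefix_prob(history)
    if denominator == 0:
        raise ValueError("history is not a prefix of any string in the language")
    return prefix_prob(list(history) + [word]) / denominator
```

In this toy setting, `next_word_prob(["the"], "dog")` divides the prefix probability of "the dog" by that of "the". A real language model would keep `next_word_prob` unchanged and replace only `prefix_prob`.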



2 Notation 

A stochastic Tree Adjoining Grammar (STAG) is represented by a tuple $(NT, \Sigma, \mathcal{I}, \mathcal{A}, \phi)$ where $NT$ is a set of nonterminal symbols, $\Sigma$ is a set of terminal symbols, $\mathcal{I}$ is a set of initial trees and $\mathcal{A}$ is a set of auxiliary trees. Trees in $\mathcal{I} \cup \mathcal{A}$ are also called elementary trees.

We refer to the root of an elementary tree $t$ as $R_t$. Each auxiliary tree has exactly one distinguished leaf, which is called the foot. We refer to the foot of an auxiliary tree $t$ as $F_t$. We let $\mathcal{V}$ denote the set of all nodes in the elementary trees.

For each leaf $N$ in an elementary tree, except when it is a foot, we define $\mathit{label}(N)$ to be the label of the node, which is either a terminal from $\Sigma$ or the empty string $\varepsilon$. For each other node $N$, $\mathit{label}(N)$ is an element from $NT$.

At a node $N$ in a tree such that $\mathit{label}(N) \in NT$ an operation called adjunction can be applied, which excises the tree at $N$ and inserts an auxiliary tree.

Function $\phi$ assigns a probability to each adjunction. The probability of adjunction of $t \in \mathcal{A}$ at node $N$ is denoted by $\phi(t, N)$. The probability that at $N$ no adjunction is applied is denoted by $\phi(\mathrm{nil}, N)$. We assume that each STAG $G$ that we consider is proper. That is, for each $N$ such that $\mathit{label}(N) \in NT$,

$$\sum_{t \in \mathcal{A} \cup \{\mathrm{nil}\}} \phi(t, N) = 1.$$
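The properness condition is easy to check mechanically. The following Python sketch assumes a hypothetical encoding in which $\phi$ is a dictionary keyed by `(tree_name, node_name)` pairs and the marker `NIL` stands for "no adjunction"; neither the encoding nor the names come from the paper.

```python
from fractions import Fraction

NIL = "nil"  # marker for "no adjunction" (a name chosen for this sketch)


def is_proper(phi, adjunction_sites, auxiliary_trees):
    """True iff, at every site N, phi(t, N) summed over t in A and nil equals 1."""
    candidates = list(auxiliary_trees) + [NIL]
    for node in adjunction_sites:
        total = sum(phi.get((t, node), 0) for t in candidates)
        if total != 1:
            return False
    return True


# Hypothetical grammar fragment: two auxiliary trees and two adjunction sites.
phi = {
    ("beta1", "N1"): Fraction(1, 4),
    ("beta2", "N1"): Fraction(1, 4),
    (NIL, "N1"): Fraction(1, 2),
    (NIL, "N2"): Fraction(1),
}
```

Exact rationals (`Fraction`) are used so that the comparison with 1 is not affected by floating-point rounding; with floats a tolerance would be needed.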

For each non-leaf node $N$ we construct the string $\mathit{cdn}(N) = \hat{N}_1 \cdots \hat{N}_m$ from the (ordered) list of children nodes $N_1, \ldots, N_m$ by defining, for each $d$ such that $1 \le d \le m$, $\hat{N}_d = \mathit{label}(N_d)$ in case $\mathit{label}(N_d) \in \Sigma \cup \{\varepsilon\}$, and $\hat{N}_d = N_d$ otherwise. In other words, children nodes are replaced by their labels unless the labels are nonterminal symbols.

To simplify the exposition, we assume an additional node for each auxiliary tree $t$, which we denote by $\bot$. This is the unique child of the actual foot node $F_t$. That is, we change the definition of $\mathit{cdn}$ such that $\mathit{cdn}(F_t) = \bot$ for each auxiliary tree $t$. We set

$$\mathcal{V}^{\bot} = \{N \in \mathcal{V} \mid \mathit{label}(N) \in NT\} \cup \Sigma \cup \{\bot\}.$$

We use symbols $a, b, c, \ldots$ to range over $\Sigma$, symbols $v, w, x, \ldots$ to range over $\Sigma^*$, symbols $N, M, \ldots$ to range over $\mathcal{V}^{\bot}$, and symbols $\alpha, \beta, \gamma, \ldots$ to range over $(\mathcal{V}^{\bot})^*$. We use $t, t', \ldots$ to denote trees in $\mathcal{I} \cup \mathcal{A}$ or subtrees thereof.

We define the predicate $\mathit{dft}$ on elements from $\mathcal{V}^{\bot}$ as $\mathit{dft}(N)$ if and only if (i) $N \in \mathcal{V}$ and $N$ dominates $\bot$, or (ii) $N = \bot$. We extend $\mathit{dft}$ to strings of the form $N_1 \cdots N_m \in (\mathcal{V}^{\bot})^*$ by defining $\mathit{dft}(N_1 \cdots N_m)$ if and only if there is a $d$ ($1 \le d \le m$) such that $\mathit{dft}(N_d)$.

For some logical expression $p$, we define $\delta(p) = 1$ iff $p$ is true, $\delta(p) = 0$ otherwise.
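The functions cdn and dft can be made concrete with a small Python sketch. The `Node` class, the string markers `BOTTOM` and `EPSILON`, and the example tree are all assumptions introduced here for illustration; the paper does not prescribe a data structure.

```python
from dataclasses import dataclass, field
from typing import List

BOTTOM = "_|_"  # stands for the extra node below each foot (hypothetical marker)
EPSILON = ""    # stands for the empty string


@dataclass(eq=False)
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    is_terminal: bool = False  # True for leaves labelled by a terminal or epsilon
    is_foot: bool = False


def cdn(node):
    """Children of `node`, with terminal/epsilon children replaced by their labels."""
    if node.is_foot:
        return [BOTTOM]  # the foot has the extra bottom node as its unique child
    return [c.label if c.is_terminal else c for c in node.children]


def dft(x):
    """True iff x is the bottom node, or a node that dominates it."""
    if isinstance(x, str):
        return x == BOTTOM
    return x.is_foot or any(dft(c) for c in x.children)


def dft_string(alpha):
    """Extension of dft to strings: some element of alpha satisfies dft."""
    return any(dft(x) for x in alpha)


# Hypothetical auxiliary tree: VP -> "quickly" VP*  (VP* is the foot).
foot = Node("VP", is_foot=True)
adverb = Node("quickly", is_terminal=True)
root = Node("VP", [adverb, foot])
```

Here `cdn(root)` is the two-element string consisting of the terminal `"quickly"` and the foot node, and `dft` holds exactly for the nodes on the spine.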

3 Overview 

The approach we adopt in the next section to 
derive a method for the computation of prefix 
probabilities for TAGs is based on transforma- 
tions of equations. Here we informally discuss 
the general ideas underlying equation transfor- 
mations. 

Let $w = a_1 a_2 \cdots a_n \in \Sigma^*$ be a string and let $N \in \mathcal{V}^{\bot}$. We use the following representation which is standard in tabular methods for TAG parsing. An item is a tuple $[N, i, j, f_1, f_2]$ representing the set of all trees $t$ such that (i) $t$ is a subtree rooted at $N$ of some derived elementary tree; and (ii) $t$'s root spans from position $i$ to position $j$ in $w$, $t$'s foot node spans from position $f_1$ to position $f_2$ in $w$. In case $N$ does not dominate the foot, we set $f_1 = f_2 = -$. We generalize in the obvious way to items $[t, i, j, f_1, f_2]$, where $t$ is an elementary tree, and $[\alpha, i, j, f_1, f_2]$, where $\mathit{cdn}(N) = \alpha\beta$ for some $N$ and $\beta$.

To introduce our approach, let us start with some considerations concerning the TAG parsing problem. When parsing $w$ with a TAG $G$, one usually composes items in order to construct new items spanning a larger portion of the input string. Assume there are instances of auxiliary trees $t$ and $t'$ in $G$, where the yield of $t'$, apart from its foot, is the empty string. If $\phi(t, N) > 0$ for some node $N$ on the spine of $t'$, and we have recognized an item $[R_t, i, j, f_1, f_2]$, then we may adjoin $t$ at $N$ and hence deduce the existence of an item $[R_{t'}, i, j, f_1, f_2]$ (see Fig. 1(a)). Similarly, if $t$ can be adjoined at a node $N$ to the left of the spine of $t'$ and $f_1 = f_2$, we may deduce the existence of an item $[R_{t'}, i, j, j, j]$ (see Fig. 1(b)). Importantly, one or more other auxiliary trees with empty yield could wrap the tree $t'$ before $t$ adjoins. Adjunctions in this situation are potentially nonterminating.

One may argue that situations where auxiliary trees have empty yield do not occur in practice, and are even by definition excluded in the case of lexicalized TAGs. However, in the computation of the prefix probability we must take into account trees with non-empty yield which behave like trees with empty yield because their lexical nodes fall to the right of the right boundary of the prefix string. For example, the two cases previously considered in Fig. 1 now generalize to those in Fig. 2.

Figure 1: Wrapping in auxiliary trees with empty yield

Figure 2: Wrapping of auxiliary trees when computing the prefix probability



To derive a method for the computation of prefix probabilities, we give some simple recursive equations. Each equation decomposes an item into other items in all possible ways, in the sense that it expresses the probability of that item as a function of the probabilities of items associated with equal or smaller portions of the input.

In specifying the equations, we exploit techniques used in the parsing of incomplete input (Lang, 1988). This allows us to compute the prefix probability as a by-product of computing the inside probability.

In order to avoid the problem of nontermination outlined above, we transform our equations to remove infinite recursion, while preserving the correctness of the probability computation. The transformation of the equations is explained as follows. For an item $I$, the span of $I$, written $\sigma(I)$, is the 4-tuple representing the 4 input positions in $I$. We will define an equivalence relation on spans that relates to the portion of the input that is covered. The transformations that we apply to our equations produce two new sets of equations. The first set of equations is concerned with all possible decompositions of a given item $I$ into a set of items of which one has a span equivalent to that of $I$ and the others have an empty span. Equations in this set represent endless recursion. The system of all such equations can be solved independently of the actual input $w$. This is done once for a given grammar.

The second set of equations has the property that, when evaluated, recursion always terminates. The evaluation of these equations computes the probability of the input string modulo the computation of some parts of the derivation that do not contribute to the input itself. Combination of the second set of equations with the solutions obtained from the first set allows the effective computation of the prefix probability.

4 Computing Prefix Probabilities

This section develops an algorithm for the computation of prefix probabilities for stochastic TAGs.

4.1 General equations

The prefix probability is given by:

$$\sum_{w \in \Sigma^*} \Pr(a_1 \cdots a_n w) = \sum_{t \in \mathcal{I}} P([t, 0, n, -, -]),$$



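where $P$ is a function over items recursively defined as follows:

$$P([t,i,j,f_1,f_2]) = P([R_t,i,j,f_1,f_2]); \tag{1}$$

$$P([\alpha N,i,j,-,-]) = \sum_{k\,(i \le k \le j)} P([\alpha,i,k,-,-]) \cdot P([N,k,j,-,-]), \quad \text{if } \alpha \ne \varepsilon \wedge \neg\mathit{dft}(\alpha N); \tag{2}$$

$$P([\alpha N,i,j,f_1,f_2]) = \sum_{k\,(i \le k \le f_1)} P([\alpha,i,k,-,-]) \cdot P([N,k,j,f_1,f_2]), \quad \text{if } \alpha \ne \varepsilon \wedge \mathit{dft}(N); \tag{3}$$

$$P([\alpha N,i,j,f_1,f_2]) = \sum_{k\,(f_2 \le k \le j)} P([\alpha,i,k,f_1,f_2]) \cdot P([N,k,j,-,-]), \quad \text{if } \alpha \ne \varepsilon \wedge \mathit{dft}(\alpha); \tag{4}$$

$$\begin{aligned}
P([N,i,j,f_1,f_2]) ={}& \phi(\mathrm{nil},N) \cdot P([\mathit{cdn}(N),i,j,f_1,f_2]) \\
&+ \sum_{f'_1,f'_2\,(i \le f'_1 \le f_1 \wedge f_2 \le f'_2 \le j)} P([\mathit{cdn}(N),f'_1,f'_2,f_1,f_2]) \cdot \sum_{t \in \mathcal{A}} \phi(t,N) \cdot P([t,i,j,f'_1,f'_2]),
\end{aligned} \quad \text{if } N \in \mathcal{V} \wedge \mathit{dft}(N); \tag{5}$$

$$\begin{aligned}
P([N,i,j,-,-]) ={}& \phi(\mathrm{nil},N) \cdot P([\mathit{cdn}(N),i,j,-,-]) \\
&+ \sum_{f'_1,f'_2\,(i \le f'_1 \le f'_2 \le j)} P([\mathit{cdn}(N),f'_1,f'_2,-,-]) \cdot \sum_{t \in \mathcal{A}} \phi(t,N) \cdot P([t,i,j,f'_1,f'_2]),
\end{aligned} \quad \text{if } N \in \mathcal{V} \wedge \neg\mathit{dft}(N); \tag{6}$$

$$P([a,i,j,-,-]) = \delta(i+1 = j \wedge a_j = a) + \delta(i = j = n); \tag{7}$$

$$P([\bot,i,j,f_1,f_2]) = \delta(i = f_1 \wedge j = f_2); \tag{8}$$

$$P([\varepsilon,i,j,-,-]) = \delta(i = j). \tag{9}$$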

Term $P([t, i, j, f_1, f_2])$ gives the inside probability of all possible trees derived from elementary tree $t$, having the indicated span over the input. This is decomposed into the contribution of each single node of $t$ in equations (1) through (6). In equations (5) and (6) the contribution of a node $N$ is determined by the combination of the inside probabilities of $N$'s children and by all possible adjunctions at $N$. In (7) we recognize some terminal symbol if it occurs in the prefix, or ignore its contribution to the span if it occurs after the last symbol of the prefix. Crucially, this step allows us to reduce the computation of prefix probabilities to the computation of inside probabilities.
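The base cases (7) to (9) can be written down directly. The Python sketch below is only an illustration under assumed conventions: the prefix is a list whose element `prefix[j - 1]` is the symbol $a_j$, positions run from 0 to $n$, and the function names are hypothetical.

```python
def delta(condition):
    """The function delta from Section 2: 1 if the condition holds, 0 otherwise."""
    return 1 if condition else 0


def p_terminal(a, i, j, prefix):
    """Equation (7): item [a, i, j, -, -] for a terminal symbol a."""
    n = len(prefix)
    # First term: a is read between positions i and j inside the prefix.
    inside_prefix = 0 <= i and i + 1 == j and j <= n and prefix[j - 1] == a
    # Second term: a lies beyond the right boundary of the prefix.
    beyond_prefix = i == j == n
    return delta(inside_prefix) + delta(beyond_prefix)


def p_bottom(i, j, f1, f2):
    """Equation (8): the node below the foot spans exactly the foot span."""
    return delta(i == f1 and j == f2)


def p_epsilon(i, j):
    """Equation (9): the empty string spans no input."""
    return delta(i == j)
```

The second term of `p_terminal` is the only place where prefix probabilities differ from ordinary inside probabilities: a terminal that falls after the prefix is accepted at position $n$ without consuming input.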

4.2 Terminating equations 

In general, the recursive equations (1) to (9) are not directly computable. This is because the value of $P([N, i, j, f, f'])$ might indirectly depend on itself, giving rise to nontermination. We therefore rewrite the equations.

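We define an equivalence relation over spans, that expresses when two items are associated with equivalent portions of the input:

$(i', j', f'_1, f'_2) \approx (i, j, f_1, f_2)$ if and only if

$$\begin{aligned}
&((i', j') = (i, j)) \;\wedge \\
&((f'_1, f'_2) = (f_1, f_2) \;\vee \\
&\quad ((f'_1 = f'_2 = i \vee f'_1 = f'_2 = j \vee f'_1 = f'_2 = -) \;\wedge \\
&\quad\;\; (f_1 = f_2 = i \vee f_1 = f_2 = j \vee f_1 = f_2 = -)))
\end{aligned}$$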
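The equivalence test on spans can be sketched in Python as follows. The sketch assumes that two equivalent spans share the same outer positions $(i, j)$, as the later use of the relation in Section 4.2 suggests, and that their foot positions are either identical or both "trivial" (both equal to $i$, both equal to $j$, or both undefined). The string `DASH` and the function names are hypothetical.

```python
DASH = "-"  # stands for an undefined foot position (hypothetical marker)


def _foot_is_trivial(i, j, f1, f2):
    """True iff f1 = f2 = i, or f1 = f2 = j, or f1 = f2 = DASH."""
    return f1 == f2 and f1 in (i, j, DASH)


def span_equiv(span_a, span_b):
    """Equivalence of two spans (i, j, f1, f2), under the assumptions stated above."""
    i_a, j_a, f1_a, f2_a = span_a
    i_b, j_b, f1_b, f2_b = span_b
    if (i_a, j_a) != (i_b, j_b):
        return False
    if (f1_a, f2_a) == (f1_b, f2_b):
        return True
    return _foot_is_trivial(*span_a) and _foot_is_trivial(*span_b)
```

Intuitively, two spans are equivalent when they cover the same words of the input: a foot that spans nothing at the left or right edge covers the same material as no foot at all.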

We introduce two new functions $P_{low}$ and $P_{split}$. When evaluated on some item $I$, $P_{low}$ recursively calls itself as long as some other item $I'$ with a given elementary tree as its first component can be reached, such that $\sigma(I) \approx \sigma(I')$. $P_{low}$ returns 0 if the actual branch of recursion cannot eventually reach such an item, thus removing the contribution to the prefix probability of that branch. If item $I'$ is reached, then $P_{low}$ switches to $P_{split}$. Complementary to $P_{low}$, function $P_{split}$ tries to decompose an argument item $I$ into items $I'$ such that $\sigma(I) \not\approx \sigma(I')$. If this is not possible through the actual branch of recursion, $P_{split}$ returns 0. If decomposition is indeed possible, then we start again with $P_{low}$ at items produced by the decomposition. The effect of this intermixing of function calls is the simulation of the original function $P$, with $P_{low}$ being called only on potentially nonterminating parts of the computation, and $P_{split}$ being called on parts that are guaranteed to terminate.

Consider some derived tree spanning some portion of the input string, and the associated derivation tree $\tau$. There must be a unique elementary tree which is represented by a node in $\tau$ that is the "lowest" one that entirely spans the portion of the input of interest. (This node might be the root of $\tau$ itself.) Then, for each $t \in \mathcal{A}$ and for each $i, j, f_1, f_2$ such that $i < j$ and $i \le f_1 \le f_2 \le j$, we must have:

$$P([t,i,j,f_1,f_2]) = \sum_{t' \in \mathcal{A},\, f'_1, f'_2\,((i,j,f'_1,f'_2) \approx (i,j,f_1,f_2))} P_{low}([t,i,j,f_1,f_2], [t',f'_1,f'_2]). \tag{10}$$

Similarly, for each $t \in \mathcal{I}$ and for each $i, j$ such that $i < j$, we must have:

$$P([t,i,j,-,-]) = \sum_{t' \in \{t\} \cup \mathcal{A},\, f \in \{-,i,j\}} P_{low}([t,i,j,-,-], [t',f,f]). \tag{11}$$

The reason why $P_{low}$ keeps a record of indices $f'_1$ and $f'_2$, i.e., the spanning of the foot node of the lowest tree (in the above sense) on which $P_{low}$ is called, will become clear later, when we introduce equations (29) and (30).

We define $P_{low}([t,i,j,f_1,f_2], [t',f'_1,f'_2])$ and $P_{low}([\alpha,i,j,f_1,f_2], [t',f'_1,f'_2])$ for $i < j$ and $(i,j,f_1,f_2) \approx (i,j,f'_1,f'_2)$, as follows.



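$$\begin{aligned}
P_{low}([t,i,j,f_1,f_2], [t',f'_1,f'_2]) ={}& P_{low}([R_t,i,j,f_1,f_2], [t',f'_1,f'_2]) \\
&+ \delta((t,f_1,f_2) = (t',f'_1,f'_2)) \cdot P_{split}([R_t,i,j,f_1,f_2]);
\end{aligned} \tag{12}$$

$$\begin{aligned}
P_{low}([\alpha N,i,j,-,-], [t,f'_1,f'_2]) ={}& P_{low}([\alpha,i,j,-,-], [t,f'_1,f'_2]) \cdot P([N,j,j,-,-]) \\
&+ P([\alpha,i,i,-,-]) \cdot P_{low}([N,i,j,-,-], [t,f'_1,f'_2]),
\end{aligned} \quad \text{if } \alpha \ne \varepsilon \wedge \neg\mathit{dft}(\alpha N); \tag{13}$$

$$\begin{aligned}
P_{low}([\alpha N,i,j,f_1,f_2], [t,f'_1,f'_2]) ={}& \delta(f_1 = j) \cdot P_{low}([\alpha,i,j,-,-], [t,f'_1,f'_2]) \cdot P([N,j,j,f_1,f_2]) \\
&+ P([\alpha,i,i,-,-]) \cdot P_{low}([N,i,j,f_1,f_2], [t,f'_1,f'_2]),
\end{aligned} \quad \text{if } \alpha \ne \varepsilon \wedge \mathit{dft}(N); \tag{14}$$

$$\begin{aligned}
P_{low}([\alpha N,i,j,f_1,f_2], [t,f'_1,f'_2]) ={}& P_{low}([\alpha,i,j,f_1,f_2], [t,f'_1,f'_2]) \cdot P([N,j,j,-,-]) \\
&+ \delta(i = f_2) \cdot P([\alpha,i,i,f_1,f_2]) \cdot P_{low}([N,i,j,-,-], [t,f'_1,f'_2]),
\end{aligned} \quad \text{if } \alpha \ne \varepsilon \wedge \mathit{dft}(\alpha); \tag{15}$$

$$\begin{aligned}
P_{low}([N,i,j,f_1,f_2], [t,f'_1,f'_2]) ={}& \phi(\mathrm{nil},N) \cdot P_{low}([\mathit{cdn}(N),i,j,f_1,f_2], [t,f'_1,f'_2]) \\
&+ P_{low}([\mathit{cdn}(N),i,j,f_1,f_2], [t,f'_1,f'_2]) \cdot \sum_{t' \in \mathcal{A}} \phi(t',N) \cdot P([t',i,j,i,j]) \\
&+ P([\mathit{cdn}(N),f_1,f_2,f_1,f_2]) \cdot \sum_{t' \in \mathcal{A}} \phi(t',N) \cdot P_{low}([t',i,j,f_1,f_2], [t,f'_1,f'_2]),
\end{aligned} \quad \text{if } N \in \mathcal{V} \wedge \mathit{dft}(N); \tag{16}$$

$$\begin{aligned}
P_{low}([N,i,j,-,-], [t,f'_1,f'_2]) ={}& \phi(\mathrm{nil},N) \cdot P_{low}([\mathit{cdn}(N),i,j,-,-], [t,f'_1,f'_2]) \\
&+ P_{low}([\mathit{cdn}(N),i,j,-,-], [t,f'_1,f'_2]) \cdot \sum_{t' \in \mathcal{A}} \phi(t',N) \cdot P([t',i,j,i,j]) \\
&+ \sum_{f''_1,f''_2\,(f''_1 = f''_2 = i \,\vee\, f''_1 = f''_2 = j)} P([\mathit{cdn}(N),f''_1,f''_2,-,-]) \cdot \sum_{t' \in \mathcal{A}} \phi(t',N) \cdot P_{low}([t',i,j,f''_1,f''_2], [t,f'_1,f'_2]),
\end{aligned} \quad \text{if } N \in \mathcal{V} \wedge \neg\mathit{dft}(N); \tag{17}$$

$$P_{low}([a,i,j,-,-], [t,f'_1,f'_2]) = 0; \tag{18}$$

$$P_{low}([\bot,i,j,f_1,f_2], [t,f'_1,f'_2]) = 0; \tag{19}$$

$$P_{low}([\varepsilon,i,j,-,-], [t,f'_1,f'_2]) = 0. \tag{20}$$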



The definition of $P_{low}$ parallels the one of $P$ given in §4.1. In (12), the second term in the right-hand side accounts for the case in which the tree we are visiting is the "lowest" one on which $P_{low}$ should be called. Note how in the above equations $P_{low}$ must be called also on nodes that do not dominate the foot node of the elementary tree they belong to (cf. the definition of $\approx$). Since no call to $P_{split}$ is possible through the terms in (18), (19) and (20), we must set the right-hand side of these equations to 0.

The specification of $P_{split}([\alpha, i, j, f_1, f_2])$ is given below. Again, the definition parallels the one of $P$ given in §4.1.



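$$\begin{aligned}
P_{split}([\alpha N,i,j,-,-]) ={}& \sum_{k\,(i < k < j)} P([\alpha,i,k,-,-]) \cdot P([N,k,j,-,-]) \\
&+ P_{split}([\alpha,i,j,-,-]) \cdot P([N,j,j,-,-]) \\
&+ P([\alpha,i,i,-,-]) \cdot P_{split}([N,i,j,-,-]),
\end{aligned} \quad \text{if } \alpha \ne \varepsilon \wedge \neg\mathit{dft}(\alpha N); \tag{21}$$

$$\begin{aligned}
P_{split}([\alpha N,i,j,f_1,f_2]) ={}& \sum_{k\,(i < k \le f_1 \wedge k < j)} P([\alpha,i,k,-,-]) \cdot P([N,k,j,f_1,f_2]) \\
&+ \delta(f_1 = j) \cdot P_{split}([\alpha,i,j,-,-]) \cdot P([N,j,j,f_1,f_2]) \\
&+ P([\alpha,i,i,-,-]) \cdot P_{split}([N,i,j,f_1,f_2]),
\end{aligned} \quad \text{if } \alpha \ne \varepsilon \wedge \mathit{dft}(N); \tag{22}$$

$$\begin{aligned}
P_{split}([\alpha N,i,j,f_1,f_2]) ={}& \sum_{k\,(i < k \wedge f_2 \le k < j)} P([\alpha,i,k,f_1,f_2]) \cdot P([N,k,j,-,-]) \\
&+ P_{split}([\alpha,i,j,f_1,f_2]) \cdot P([N,j,j,-,-]) \\
&+ \delta(i = f_2) \cdot P([\alpha,i,i,f_1,f_2]) \cdot P_{split}([N,i,j,-,-]),
\end{aligned} \quad \text{if } \alpha \ne \varepsilon \wedge \mathit{dft}(\alpha); \tag{23}$$

$$\begin{aligned}
P_{split}([N,i,j,f_1,f_2]) ={}& \phi(\mathrm{nil},N) \cdot P_{split}([\mathit{cdn}(N),i,j,f_1,f_2]) \\
&+ \sum_{f'_1,f'_2\,(i \le f'_1 \le f_1 \,\wedge\, f_2 \le f'_2 \le j \,\wedge\, (f'_1,f'_2) \ne (i,j) \,\wedge\, (f'_1,f'_2) \ne (f_1,f_2))} P([\mathit{cdn}(N),f'_1,f'_2,f_1,f_2]) \cdot \sum_{t \in \mathcal{A}} \phi(t,N) \cdot P([t,i,j,f'_1,f'_2]) \\
&+ P_{split}([\mathit{cdn}(N),i,j,f_1,f_2]) \cdot \sum_{t \in \mathcal{A}} \phi(t,N) \cdot P([t,i,j,i,j]),
\end{aligned} \quad \text{if } N \in \mathcal{V} \wedge \mathit{dft}(N); \tag{24}$$

$$\begin{aligned}
P_{split}([N,i,j,-,-]) ={}& \phi(\mathrm{nil},N) \cdot P_{split}([\mathit{cdn}(N),i,j,-,-]) \\
&+ \sum_{f'_1,f'_2\,(i \le f'_1 \le f'_2 \le j \,\wedge\, (f'_1,f'_2) \ne (i,j) \,\wedge\, \neg(f'_1 = f'_2 = i \,\vee\, f'_1 = f'_2 = j))} P([\mathit{cdn}(N),f'_1,f'_2,-,-]) \cdot \sum_{t \in \mathcal{A}} \phi(t,N) \cdot P([t,i,j,f'_1,f'_2]) \\
&+ P_{split}([\mathit{cdn}(N),i,j,-,-]) \cdot \sum_{t \in \mathcal{A}} \phi(t,N) \cdot P([t,i,j,i,j]),
\end{aligned} \quad \text{if } N \in \mathcal{V} \wedge \neg\mathit{dft}(N); \tag{25}$$

$$P_{split}([a,i,j,-,-]) = \delta(i+1 = j \wedge a_j = a); \tag{26}$$

$$P_{split}([\bot,i,j,f_1,f_2]) = 0; \tag{27}$$

$$P_{split}([\varepsilon,i,j,-,-]) = 0. \tag{28}$$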

We can now separate those branches of recursion that terminate on the given input from the cases of endless recursion. We assume below that $P_{split}([R_{t'},i,j,f'_1,f'_2]) > 0$. Even if this is not always valid, for the purpose of deriving the equations below, this assumption does not lead to invalid results. We define a new function $P_{outer}$, which accounts for probabilities of subderivations that do not derive any words in the prefix, but contribute structurally to its derivation:

$$P_{outer}([t,i,j,f_1,f_2], [t',f'_1,f'_2]) = \frac{P_{low}([t,i,j,f_1,f_2], [t',f'_1,f'_2])}{P_{split}([R_{t'},i,j,f'_1,f'_2])}; \tag{29}$$

$$P_{outer}([\alpha,i,j,f_1,f_2], [t',f'_1,f'_2]) = \frac{P_{low}([\alpha,i,j,f_1,f_2], [t',f'_1,f'_2])}{P_{split}([R_{t'},i,j,f'_1,f'_2])}. \tag{30}$$

We can now eliminate the infinite recursion that arises in (10) and (11) by rewriting $P([t,i,j,f_1,f_2])$ in terms of $P_{outer}$:

$$P([t,i,j,f_1,f_2]) = \sum_{t' \in \mathcal{A},\, f'_1,f'_2\,((i,j,f'_1,f'_2) \approx (i,j,f_1,f_2))} P_{outer}([t,i,j,f_1,f_2], [t',f'_1,f'_2]) \cdot P_{split}([R_{t'},i,j,f'_1,f'_2]); \tag{31}$$

$$P([t,i,j,-,-]) = \sum_{t' \in \{t\} \cup \mathcal{A},\, f \in \{-,i,j\}} P_{outer}([t,i,j,-,-], [t',f,f]) \cdot P_{split}([R_{t'},i,j,f,f]). \tag{32}$$

Equations for $P_{outer}$ will be derived in the next subsection.

In summary, terminating computation of prefix probabilities should be based on equations (31) and (32), which replace (1), along with equations (2) to (9) and all the equations for $P_{split}$.

4.3 Off-line Equations 

In this section we derive equations for function $P_{outer}$ introduced in §4.2 and deal with all remaining cases of equations that cause infinite recursion.

In some cases, function $P$ can be computed independently of the actual input. For any $i < n$ we can consistently define the following quantities, where $t \in \mathcal{I} \cup \mathcal{A}$ and $\alpha \in \mathcal{V}^{\bot}$ or $\mathit{cdn}(N) = \alpha\beta$ for some $N$ and $\beta$:

$$H_t = P([t,i,i,f,f]);$$
$$H_\alpha = P([\alpha,i,i,f',f']),$$

where $f = i$ if $t \in \mathcal{A}$, $f = -$ otherwise, and $f' = i$ if $\mathit{dft}(\alpha)$, $f' = -$ otherwise. Thus, $H_t$ is the probability of all derived trees obtained from $t$, with no lexical node at their yields. Quantities $H_t$ and $H_\alpha$ can be computed by means of a system of equations which can be directly obtained from equations (1) to (9). Similar quantities as above must be introduced for the case $i = n$. For instance, we can set $H'_t = P([t,n,n,f,f])$, $f$ specified as above, which gives the probability of all derived trees obtained from $t$ (with no restriction at their yields).

Function $P_{outer}$ is also independent of the actual input. Let us focus here on the case $f_1, f_2 \notin \{i, j, -\}$ (this enforces $(f_1, f_2) = (f'_1, f'_2)$ below). For any $i, j, f_1, f_2 < n$, we can consistently define the following quantities.

$$L_{t,t'} = P_{outer}([t,i,j,f_1,f_2], [t',f'_1,f'_2]);$$
$$L_{\alpha,t'} = P_{outer}([\alpha,i,j,f_1,f_2], [t',f'_1,f'_2]).$$

In the case at hand, $L_{t,t'}$ is the probability of all derived trees obtained from $t$ such that (i) no lexical node is found at their yields; and (ii) at some 'unfinished' node dominating the foot of $t$, the probability of the adjunction of $t'$ has already been accounted for, but $t'$ itself has not been adjoined.

It is straightforward to establish a system of equations for the computation of $L_{t,t'}$ and $L_{\alpha,t'}$, by rewriting equations (12) to (20) according to (29) and (30). For instance, combining (12) and (29) gives (using the above assumptions on $f_1$ and $f_2$):

$$L_{t,t'} = L_{R_t,t'} + \delta(t = t').$$

Also, if $\alpha \ne \varepsilon$ and $\mathit{dft}(N)$, combining (14) and (30) gives (again, using previous assumptions on $f_1$ and $f_2$; note that the $H_\alpha$'s are known terms here):

$$L_{\alpha N,t'} = H_\alpha \cdot L_{N,t'}.$$

For any $i, f_1, f_2 < n$ and $j = n$, we also need to define:

$$L'_{t,t'} = P_{outer}([t,i,n,f_1,f_2], [t',f'_1,f'_2]);$$
$$L'_{\alpha,t'} = P_{outer}([\alpha,i,n,f_1,f_2], [t',f'_1,f'_2]).$$

Here $L'_{t,t'}$ is the probability of all derived trees obtained from $t$ with a node dominating the foot node of $t$, that is an adjunction site for $t'$ and is 'unfinished' in the same sense as above, and with lexical nodes only in the portion of the tree to the right of that node. When we drop our assumption on $f_1$ and $f_2$, we must (pre)compute in addition terms of the form $P_{outer}([t,i,j,i,i], [t',i,i])$ and $P_{outer}([t,i,j,j,j], [t',j,j])$ for $i < j < n$, $P_{outer}([t,i,n,f_1,n], [t',f'_1,f'_2])$ for $i < f_1 < n$, $P_{outer}([t,i,n,n,n], [t',f'_1,f'_2])$ for $i < n$, and similar. Again, these are independent of the choice of $i$, $j$ and $f_1$. Full treatment is omitted due to length restrictions.
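Since the equations for the $L$ quantities are linear in the unknowns once the $H$ quantities are known, they can be solved exactly by elimination. The Python sketch below solves a system written as $x = b + A x$. The two-unknown example is hypothetical: its first row mimics $L_{t,t'} = L_{R_t,t'} + \delta(t = t')$ with $t = t'$, while the coefficient $1/2$ in the second row is invented purely for illustration.

```python
from fractions import Fraction


def solve_linear(a, b):
    """Solve x = b + a.x, i.e. (I - a).x = b, by Gauss-Jordan elimination.

    Raises StopIteration if the system is singular (no pivot can be found).
    """
    n = len(b)
    # Augmented matrix [I - a | b] over exact rationals.
    m = [
        [Fraction(1 if r == c else 0) - Fraction(a[r][c]) for c in range(n)]
        + [Fraction(b[r])]
        for r in range(n)
    ]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


# Hypothetical system: x0 = x1 + 1 and x1 = (1/2) * x0.
A = [[0, 1], [Fraction(1, 2), 0]]
B = [1, 0]
solution = solve_linear(A, B)
```

For grammars of realistic size a sparse or numerical solver would be preferable; exact rationals are used here only to keep the example easy to verify by hand.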

5 Complexity and concluding remarks

We have presented a method for the computation of the prefix probability when the underlying model is a Tree Adjoining Grammar. Function $P_{split}$ is the core of the method. Its equations can be directly translated into an effective algorithm, using standard functional memoization or other tabular techniques. It is easy to see that such an algorithm can be made to run in time $O(n^6)$, where $n$ is the length of the input prefix.
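One informal way to see where the exponent comes from is to count index combinations. An item with a foot carries four positions $(i, j, f_1, f_2)$, and the adjunction case sums over two further positions $(f'_1, f'_2)$ with $i \le f'_1 \le f_1 \le f_2 \le f'_2 \le j$. The Python sketch below simply counts such 6-tuples over positions $0, \ldots, n$; it is a counting illustration under that reading, not a complexity proof, and it ignores the grammar-dependent factor.

```python
from itertools import combinations_with_replacement
from math import comb


def count_index_tuples(n):
    """Count tuples with 0 <= i <= f1' <= f1 <= f2 <= f2' <= j <= n."""
    # Each non-decreasing 6-tuple of positions corresponds to exactly one
    # assignment of (i, f1', f1, f2, f2', j).
    return sum(1 for _ in combinations_with_replacement(range(n + 1), 6))


def closed_form(n):
    """The same count as a binomial coefficient, a degree-6 polynomial in n."""
    return comb(n + 6, 6)
```

The closed form grows as $n^6 / 720$ for large $n$, which matches the stated degree of the running time.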

All the quantities introduced in §4.3 ($H_t$, $L_{t,t'}$, etc.) are independent of the input and should be computed off-line, using the system of equations that can be derived as indicated. For quantities $H_t$ we have a non-linear system, since equations (2) to (6) contain quadratic terms. Solutions can then be approximated to any degree of precision using standard iterative methods, as for instance those exploited in (Stolcke, 1995). Under the hypothesis that the grammar is consistent, that is $\Pr(L(G)) = 1$, all quantities $H'_t$ and $H'_\alpha$ evaluate to one. For quantities $L_{t,t'}$ and the like, §4.3 provides linear systems whose solutions can easily be obtained using standard methods. Note also that quantities $L_{\alpha,t'}$ are only used in the off-line computation of quantities $L_{t,t'}$; they do not need to be stored for the computation of prefix probabilities (compare equations for $L_{t,t'}$ with (31) and (32)).
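As an illustration of the kind of iterative method meant here, the Python sketch below approximates the least solution of a small non-linear system by repeated substitution starting from zero. The single equation $h = 0.4 + 0.6\,h^2$ is a hypothetical stand-in with one quadratic term; it is not derived from any particular grammar, and the function names are assumptions of this sketch.

```python
def least_fixed_point(step, size, tol=1e-12, max_iter=100_000):
    """Iterate x <- step(x) from the zero vector until successive values agree."""
    x = [0.0] * size
    for _ in range(max_iter):
        nxt = step(x)
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    raise RuntimeError("fixed-point iteration did not converge")


def toy_step(x):
    """One substitution step for the hypothetical equation h = 0.4 + 0.6 * h * h."""
    (h,) = x
    return [0.4 + 0.6 * h * h]


(h_value,) = least_fixed_point(toy_step, size=1)
```

The toy equation has two solutions, $2/3$ and $1$. Starting from zero, the iteration increases monotonically and converges to the smaller one, which is the behaviour wanted when the unknowns are probabilities built up from ever larger sets of derivations.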



We can easily develop implementations of our method that can compute prefix probabilities incrementally. That is, after we have computed the prefix probability for a prefix $a_1 \cdots a_n$, on input $a_{n+1}$ we can extend the calculation to prefix $a_1 \cdots a_n a_{n+1}$ without having to recompute all intermediate steps that do not depend on $a_{n+1}$. This step takes time $O(n^5)$.

In this paper we have assumed that the parameters of the stochastic TAG have been previously estimated. In practice, smoothing to avoid sparse data problems plays an important role. Smoothing can be handled for prefix probability computation in the following ways. Discounting methods for smoothing simply produce a modified STAG model which is then treated as input to the prefix probability computation. Smoothing methods such as deleted interpolation, which combine class-based models with word-based models to avoid sparse data problems, have to be handled by a cognate interpolation of prefix probability models.
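One possible reading of such an interpolation is a linear mixture: if the string distribution is $\lambda P_1 + (1 - \lambda) P_2$, then, because the sum over continuations is linear, the prefix probability is the same mixture of the two prefix probabilities. The Python sketch below illustrates this reading only. The two toy distributions, the weight, and all names are hypothetical, and the prefix probabilities are computed by brute force rather than by the TAG algorithm.

```python
from fractions import Fraction


def make_prefix_prob(distribution):
    """Brute-force prefix probability for a finite distribution over sentences."""
    def prefix_prob(prefix):
        prefix = tuple(prefix)
        return sum(p for s, p in distribution.items() if s[:len(prefix)] == prefix)
    return prefix_prob


def interpolate(prefix_prob_a, prefix_prob_b, lam):
    """Prefix probability of the mixture lam * model_a + (1 - lam) * model_b."""
    if not 0 <= lam <= 1:
        raise ValueError("lam must lie in [0, 1]")

    def mixed(prefix):
        return lam * prefix_prob_a(prefix) + (1 - lam) * prefix_prob_b(prefix)
    return mixed


# Hypothetical word-based and class-based models over the same vocabulary.
WORD_MODEL = {("the", "dog"): Fraction(1)}
CLASS_MODEL = {("the", "dog"): Fraction(1, 2), ("the", "cat"): Fraction(1, 2)}

mixed_prefix_prob = interpolate(
    make_prefix_prob(WORD_MODEL), make_prefix_prob(CLASS_MODEL), Fraction(3, 4)
)
```

In this toy case the word-based model assigns zero probability to "the cat", while the mixture still reserves some mass for it through the class-based model.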